Skip to content

Conversation

@alexlivekit
Copy link
Contributor

See this thread for more details. Main point:

When we authenticate for the purposes of a SIP session (when credentials are not being cached), the following happens:

  • client -> livekit : I want to start a call (INVITE)
  • client <- livekit : You need to authenticate (SIP-407 Unauthorized).
    • This may count as a failure.
    • The 407 response contains an auth challenge.
  • client -> livekit : I want to start a call (INVITE), this time with the credentials.
    • This call has the exact same callid as the first one.
    • It also contains the auth response.
  • client -> livekit : credentials look good, I'll forward your call request (SIP-100 Trying).

If we happen to count the 407 response as a failed call, and we generate unique SCL_* ids for both.

@alexlivekit alexlivekit requested a review from a team as a code owner October 20, 2025 21:30
inviteOKRetryAttemptsNoACK = 2
inviteOkAckLateTimeout = inviteOkRetryIntervalMax

inviteCredentialValidity = 60 * time.Minute // Allow reuse of credentials for 1h
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is smart, might want to extend this to max_call_duration or something.

Also, keep in mind that this is per Call-ID for now, so new calls would still need re-auth.

tr := callTransportFromReq(req)
legTr := legTransportFromReq(req)
log := s.log.WithValues(
"callID", callID,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one log line that will not have callID now is "Bad request", when validation fails.

@alexlivekit alexlivekit changed the title Reuse the same ID for both auth-less and auth-ful INVITEs TEL-222: Reuse the same ID for both auth-less and auth-ful INVITEs Oct 20, 2025
if err != nil {
return nil, fmt.Errorf("invalid challenge %q: %w", challengeStr, err)
}
toHeader := resp.To()
Copy link
Contributor Author

@alexlivekit alexlivekit Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we're doing the right validation on the inbound side, the outbound side E2E test caught this error!
But this also means out clients might run into the same issue, in case some of them are not spec-compliant.

s.inProgressInvites[key] = is

go func() {
time.Sleep(inviteCredentialValidity)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better avoid spawning goroutines that wait for the whole hour without a clear cancellation signal.

Usually, you'd have one goroutine periodically cleaning the expired cache. time.AfterFunc could work, but again, there's no real need to create thousands of timers all waiting for an hour.

log: log,
s: s,
id: id,
id: "unassigned",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be helpful if it logs/panics if something tries to read this unassigned ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not add shims or verification for this. Unless it's real clean, I'd rather leave it blank. Would that be better in your opinion?

@alexlivekit alexlivekit changed the title TEL-222: Reuse the same ID for both auth-less and auth-ful INVITEs Reuse the same ID for both auth-less and auth-ful INVITEs Oct 21, 2025
@alexlivekit alexlivekit marked this pull request as draft October 22, 2025 23:33
@alexlivekit alexlivekit marked this pull request as ready for review October 23, 2025 07:45
toTag: toTag,
fromTag: fromTag,
}
s.imu.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A common practice to avoid lock contention is to use RWMutex and doing a two stage lock:

s.imu.RLock()
is, ok := s.inProgressInvites[key]
s.imu.RUnlock()
if ok {
   return is
}
s.imu.Lock()
defer s.imu.Unlock()
is, ok := s.inProgressInvites[key]
if ok {
   return is
}
// ... the rest ...

This allows multiple readers to get the invite state without blocking each other. Also notice that we redo the check after getting a write lock - other routine might create the state earlier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern:

  • Adds a few lines of code - maintenance complexity over time.
  • Adds a few operations - Every first invite will have to go through two lock/unlock cycles (userspace, but still not 0us).
  • Optimization? - Does it save time in this case? Sipgo de-dupes retransmissions of all requests, and in-dialog INVITEs are not possible before we conclude the initial invite. So the only possible race condition here is two separate re-INVITE requests that are not retransmissions. In that case, the flow is lock -> check exists -> unlock -> return existing with or without this pattern. What am I missing here?
  • Consistency - If we just always do it that way, that make a solid practice and approach.

My initial approach was "keep it simple", I'll happily change that for consistency's sake, but I would love to make sure I understand the performance aspect of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants